ABSTRACT

Motivation: Systems biology demands the use of several points of
view to gain a more comprehensive understanding of biological problems.
This usually means taking into account different data regarding
the problem at hand, but it also involves using different perspectives
on the same data. This multifaceted aspect of systems biology
often requires several tools, and it is often hard to integrate them
seamlessly in a way that helps the analyst maintain an interactive
discourse with the data.

Results: Focusing on expression profiling, BicOverlapper 2.0 visual- 
izes the most relevant aspects of the analysis, including expression 
data, profiling analysis results and functional annotation. It also inte- 
grates several state-of-the-art numerical methods, such as differential 
expression analysis, gene set enrichment or biclustering. 
Availability and implementation: BicOverlapper 2.0 is available at: 
http://vis.usal.es/bicoverlapper2 
Contact: rodri@usal.es 
1 INTRODUCTION

BicOverlapper 1.0 (Santamaria et al., 2008) focused on the
visualization of complex gene expression analysis results coming
from biclustering algorithms. Based on Venn-like diagrams and
overlapping visualization layers, it successfully conveyed biclusters.
With the use of BicOverlapper by the authors and third-party
users, several new requirements arose, and it has evolved to
support other analysis techniques and additional steps of the
analysis process. Similar evolutions have occurred in other
tools in the field. For example, Expander has extended microarray
data analysis with relational and functional information
(Ulitsky et al., 2010). Hierarchical Clustering Explorer, although
originally designed for general use, added new methods for
bioinformatics analysis (Seo et al., 2006). Treeview (Saldanha, 2004)
is developing toward a new version that will address
high-throughput biology needs (see
https://www.princeton.edu/~abarysh/treeview/).

2 APPROACH 

During the design of BicOverlapper 2.0, we focused on a high level
of interaction and a visual analytics (Thomas and Cook, 2005)
approach. Another important design principle was the simplification
of installation and interfaces. Finally, following the original
'overlapping' philosophy, we designed linked visualizations and
an agglomerative use of standard numerical analyses. For example,
differential expression analysis compares two experimental
conditions, but BicOverlapper 2.0 allows the user to compare several
combinations of experimental conditions at once and then to visualize
the relationships between the differentially expressed groups.
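
The idea of comparing several combinations of conditions at once and then relating the resulting groups can be sketched as follows. This is only a minimal illustration in Python, not BicOverlapper's implementation (the tool runs its analyses in R/Bioconductor). The function names, the condition labels and the toy gene sets are all hypothetical.

```python
from itertools import combinations


def pairwise_comparisons(conditions):
    """Enumerate every unordered pair of experimental conditions."""
    return list(combinations(conditions, 2))


def group_overlaps(groups):
    """Return the shared genes for every pair of differentially expressed groups.

    `groups` maps a comparison label to the set of genes found
    differentially expressed in that comparison.
    """
    overlaps = {}
    for (name_a, genes_a), (name_b, genes_b) in combinations(groups.items(), 2):
        shared = genes_a & genes_b
        if shared:  # keep only pairs of groups that actually intersect
            overlaps[(name_a, name_b)] = shared
    return overlaps


# Hypothetical condition labels and toy gene identifiers, for illustration only.
conditions = ["early", "mid", "late"]
comparisons = pairwise_comparisons(conditions)

toy_groups = {
    "early vs late": {"g1", "g2", "g3"},
    "mid vs late": {"g2", "g3", "g4"},
    "early vs mid": {"g5"},
}
overlaps = group_overlaps(toy_groups)
```

In this toy case, only the 'early vs late' and 'mid vs late' groups intersect, which is the kind of relationship an overlapping-groups view is meant to expose.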

3 METHODS 

The tool is implemented as two interconnected layers: visualization and 
analysis. The analysis layer is R/Bioconductor-dependent, using several 
packages and ad hoc scripts. Data retrieval from Gene Expression
Omnibus (GEO) and ArrayExpress is supported by their corresponding
packages (Davis and Meltzer, 2007; Kauffmann et al., 2009), although
it requires high bandwidth and not all of the experiments are supported.
Data analysis includes the following: 

• Differential expression with limma (Smyth, 2005). In addition to
one-to-one comparisons, BicOverlapper allows multiple comparisons
to be performed at once, visualized as intersecting differentially
expressed groups. This way, analysis time is reduced, and the
differences between the comparisons can be inspected.

• Gene set enrichment analysis is also implemented via GSEAlm (Oron 
and Gentleman, 2008). Enriched gene sets are visualized as overlap- 
ping groups. 

• Biclustering, as in the previous version, is computed with the biclust
package (Kaiser et al., 2013). The Iterative Signature Algorithm (ISA)
is now also available through the isa2 package.

• Correlation networks. This is a simple yet powerful method to find 
groups. Genes with low overall expression variation are filtered out, 
and the rest are linked if they have a profile distance below some 
standard deviations. The resulting network is visualized as a force- 
directed layout, where nodes can be colored by the expression under 
selected conditions. 
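
The correlation-network step in the last item can be sketched as follows. The text does not specify the distance measure or the thresholds, so this sketch assumes Euclidean distance, a variation filter on the standard deviation of each profile, and a link threshold set k standard deviations below the mean pairwise distance. It is written in Python for illustration only (the tool itself relies on R/Bioconductor), and all function and parameter names are hypothetical.

```python
from itertools import combinations
from math import dist
from statistics import mean, pstdev


def correlation_network(profiles, min_sd=0.5, k=1.0):
    """Build an edge list linking genes with similar expression profiles.

    profiles: dict mapping gene name -> list of expression values.
    min_sd:   assumed filter; genes whose profile standard deviation
              is below this value are dropped.
    k:        assumed threshold; two genes are linked when their distance
              is more than k standard deviations below the mean distance.
    """
    # Step 1: filter out genes with low overall expression variation.
    kept = {g: p for g, p in profiles.items() if pstdev(p) >= min_sd}

    # Step 2: Euclidean distance between every pair of remaining profiles.
    distances = {
        (a, b): dist(kept[a], kept[b]) for a, b in combinations(sorted(kept), 2)
    }
    if len(distances) < 2:
        return []

    # Step 3: link pairs whose distance falls below the threshold.
    values = list(distances.values())
    threshold = mean(values) - k * pstdev(values)
    return [pair for pair, d in distances.items() if d < threshold]


# Toy profiles, for illustration only.
toy_profiles = {
    "gA": [0.0, 2.0, 4.0],
    "gB": [0.1, 2.1, 4.1],
    "gC": [4.0, 2.0, 0.0],
    "gD": [1.0, 1.0, 1.0],  # flat profile, removed by the variation filter
}
edges = correlation_network(toy_profiles)
```

The resulting edge list is the kind of input a force-directed layout would take; the layout and node coloring are left out of this sketch.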

The visualization layer is developed in Java and it communicates with 
the analysis layer via rJava (Urbanek, 2007). This layer contains several 
visualization techniques, with implementations based on Prefuse (Heer
et al., 2005) (networks, scatterplots), Processing (Reas and Fry, 2007)
(overlapper, heatmap) and plain Java (parallel coordinates, word clouds).

4 RESULTS 

To involve biology specialists in bioinformatics analyses,
we need simpler and highly interactive tools. For example,
Figure 1 was generated only by clicking two menu options
and selecting one visual item and gene/condition labels, in a
process that takes no more than 5 min (see Supplementary
Video at http://vis.usal.es/bicoverlapper2/docs/tour.mp4).
Underneath, this requires the seamless connection of different
steps: expression data loading, computation of distribution
statistics, three differential expression analyses (for up- and
downregulation), gene annotation retrieval and the visualization
of four interactive representations.

Fig. 1. Yeast gene expression profile along three cell cycles, from experiment GSE3431 (Tu et al., 2005). Each cell cycle is divided into three time intervals
(early, mid and late). Differential expression for every combination of such intervals is computed and visualized as overlapping groups. Thirty-six genes
upregulated at early and mid intervals have been selected (intersection between 'early versus late' and 'mid versus late' groups at the bottom left); their
expression profiles are shown in parallel coordinates and heatmap visualizations. Finally, the functional annotations, stacked by term, are shown as a
word cloud, indicating, for example, that 9 of the 36 genes are related to metabolic and oxidation-reduction processes.

Figure 1 provides a considerable amount of information about
the experiment. First, parallel coordinates (Inselberg, 2009)
indicate with boxplots that the data are normalized, although
probably skewed towards upregulation. Second, differential
expression groups, displayed as Venn diagrams, present a large
overlap for genes upregulated at mid and early timepoints with
respect to late timepoints. These intersecting genes show a clear
pattern in the heatmap and parallel coordinates views and include nine
genes related to the Gene Ontology (GO) term 'oxidation-reduction
process' and five related to 'fatty acid beta-oxidation'.

5 CONCLUSION 

BicOverlapper is a simple-to-use, highly visual and interactive 
tool for gene expression analysis. Easily and without program- 
ming knowledge, the user can have an overall view of several 
expression aspects, from raw data to analysis results and func- 
tional annotations. This may significantly reduce the analysis 
time and improve the analytical discourse with the data. In
the future, we plan to support high-throughput data, especially
RNA-Seq, and to add comprehensive report and image generation.